Abstract This paper addresses some trust-region methods equipped with nonmonotone strategies for solving nonlinear unconstrained optimization problems. More specifically, the importance of using nonmonotone techniques in nonlinear optimization is motivated, then two new nonmonotone terms are proposed, and their combination with the traditional trust-region framework is studied. Global convergence to first- and second-order stationary points and local superlinear and quadratic convergence rates are established for both algorithms. Numerical experiments on the CUTEst test collection of unconstrained problems and some highly nonlinear test functions are reported, where a comparison among state-of-the-art nonmonotone trust-region methods shows the efficiency of the proposed nonmonotone schemes.
1 Introduction

In this paper we consider the unconstrained minimization problem

$$ \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & x \in \mathbb{R}^n, \end{array} \tag{1} $$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a real-valued nonlinear function that is bounded and continuously differentiable. We suppose that a first- or second-order black-box oracle of $f$ is available.

Motivation & history. Trust-region methods, also called restricted step methods [21], are a class of iterative schemes developed to solve convex or nonconvex optimization problems, see, for example, [13]. They have also been developed for nonsmooth problems, see [13 na 03 ng. Trust-region methods have strong convergence properties, are reliable and robust in computation, and can handle ill-conditioned problems, cf. [34, 35]. Let $x_k$ be the current iterate. In the trust-region framework the objective $f$ is approximated by a simple model in a specific region around $x_k$, called the region of trust, in which the model is an acceptable approximation of the original objective. The model is then minimized subject to the trust-region constraint to find a trial step $d_k$. Here "simple" means that the model can be minimized much more easily than the original objective function. If the model is an adequate approximation of the objective function within the trust region, then the point $x_{k+1} = x_k + d_k$ is accepted by the trust-region method and the region can be expanded for the next iteration; conversely, if the approximation is poor, the region is contracted and the model is minimized within the contracted region. This scheme is continued until a trial step $d_k$ is found that guarantees an acceptable agreement between the model and the objective function.
Several quadratic and non-quadratic models have been proposed to approximate the objective function in optimization, see [14, 22, 36, 39]; however, the conic and quadratic models are more popular, see mm-
If the approximate model is quadratic, i.e.,

$$ q_k(d) := f_k + g_k^T d + \tfrac{1}{2} d^T B_k d, \tag{2} $$

where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $B_k \approx \nabla^2 f(x_k)$, the trust-region method can be considered a globally convergent generalization of the classical Newton's method. The trust-region subproblem is then defined by

$$ \begin{array}{ll} \text{minimize} & q_k(d) \\ \text{subject to} & \|d\| \le \delta_k. \end{array} \tag{3} $$

Hence the trust region is commonly a norm ball $C$ defined by

$$ C := \{ d \in \mathbb{R}^n \mid \|d\| \le \delta_k \}, $$

where $\delta_k > 0$ is a real number called the trust-region radius, and $\|\cdot\|$ is any norm in $\mathbb{R}^n$, cf. [29]. Since $C$ is compact and the model is continuous, the trust-region subproblem attains its minimizer on the set $C$. Most of the computational cost of trust-region methods relates to minimizing the model over the trust region $C$. Hence finding efficient schemes for solving (3) has received much attention during the past few decades, see pa EDI E3 ED E9- Once the step $d$ is computed, the quality of the model in the trust region is evaluated by the ratio of the actual reduction of the objective, $f_k - f(x_k + d)$, to the predicted reduction of the model, $q_k(0) - q_k(d)$, i.e.,

$$ r_k = \frac{f_k - f(x_k + d)}{q_k(0) - q_k(d)}. \tag{4} $$

For a prescribed positive constant $\mu_1 \in (0, 1]$, if $r_k \ge \mu_1$, the model provides a reasonable approximation, the step is accepted, i.e., $x_{k+1} = x_k + d_k$, and the trust region $C$ can be expanded for the next step. Otherwise, the trust region $C$ is contracted by decreasing the radius $\delta_k$, and the subproblem (3) is solved in the reduced region. This scheme is continued until the step $d$ is accepted by the trust-region test $r_k \ge \mu_1$. Our discussion can be summarized in the following algorithm:


Algorithm 1: TTR (traditional trust-region algorithm)

Input: $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$, $k_{\max}$; $0 < \mu_1 \le \mu_2 \le 1$, $0 < \rho_1 < 1 \le \rho_2$, $\varepsilon > 0$;
Output: $x_b$; $f_b$;
begin
    $\delta_0 \leftarrow \|g_0\|$; $k \leftarrow 0$;
    while $\|g_k\| \ge \varepsilon$ & $k \le k_{\max}$ do
        solve the subproblem (3) to specify $d_k$;
        $\hat{x}_k \leftarrow x_k + d_k$; compute $f(\hat{x}_k)$;
        determine $r_k$ using (4);
        while $r_k < \mu_1$ do
            $\delta_k \leftarrow \rho_1 \delta_k$;
            solve the subproblem (3) to specify $d_k$;
            $\hat{x}_k \leftarrow x_k + d_k$; compute $f(\hat{x}_k)$;
            determine $r_k$ using (4);
        end
        $x_{k+1} \leftarrow \hat{x}_k$;
        if $r_k \ge \mu_2$ then
            $\delta_{k+1} \leftarrow \rho_2 \delta_k$;
        end
        update $B_{k+1}$; $k \leftarrow k + 1$;
    end
    $x_b \leftarrow x_k$; $f_b \leftarrow f_k$;
end
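As an illustration, the steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the subproblem (3) is solved only approximately by a Cauchy step along $-g_k$, the exact Hessian is used as $B_k$, the radius is left unchanged when $\mu_1 \le r_k < \mu_2$, and the helper names (`cauchy_step`, `ttr`) and default parameter values are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def cauchy_step(g, B, delta):
    """Minimizer of the quadratic model along -g inside the ball ||d|| <= delta."""
    gnorm = norm(g)
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return [-tau * delta / gnorm * gi for gi in g]


def ttr(f, grad, hess, x0, mu1=0.05, mu2=0.9, rho1=0.25, rho2=2.5,
        eps=1e-5, kmax=10000):
    """Sketch of Algorithm 1 (assumption: Cauchy steps, exact Hessian as B_k)."""
    x = list(x0)
    fx, g = f(x), grad(x)
    delta = norm(g)                      # delta_0 = ||g_0||
    history = [fx]
    k = 0
    while norm(g) >= eps and k <= kmax:
        B = hess(x)
        while True:                      # inner loop: shrink until r_k >= mu1
            d = cauchy_step(g, B, delta)
            pred = -(dot(g, d) + 0.5 * dot(d, matvec(B, d)))  # q_k(0) - q_k(d)
            trial = [xi + di for xi, di in zip(x, d)]
            f_trial = f(trial)
            r = (fx - f_trial) / pred    # ratio (4)
            if r >= mu1:
                break
            delta *= rho1
        x, fx = trial, f_trial
        g = grad(x)
        if r >= mu2:                     # very good agreement: enlarge the region
            delta *= rho2
        history.append(fx)
        k += 1
    return x, fx, history


# Usage on a convex quadratic (hypothetical test problem).
quad = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
quad_grad = lambda x: [2.0 * x[0], 20.0 * x[1]]
quad_hess = lambda x: [[2.0, 0.0], [0.0, 20.0]]
x_b, f_b, values = ttr(quad, quad_grad, quad_hess, [1.0, 1.0])
```

Because a step is accepted only when $r_k \ge \mu_1 > 0$ with a positive predicted reduction, the recorded function values in `values` decrease at every iteration, which is the monotonicity discussed next.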




In Algorithm 1, it follows from $r_k \ge \mu_1$ and $q_k(0) - q_k(d_k) > 0$ that

$$ f_k - f_{k+1} \ge \mu_1 (q_k(0) - q_k(d_k)) > 0, $$

implying $f_{k+1} < f_k$. This means that the sequence of function values $\{f_k\}$ is monotonically decreasing, which is why the traditional trust-region method is also called the monotone trust-region method. This feature seems natural for minimization schemes; however, it slows down the convergence of TTR to a minimizer if the objective involves a curved narrow valley, see BET]. To observe the effect of nonmonotonicity on TTR, we study the next example.
Example 1 Consider the two-dimensional Nesterov-Chebyshev-Rosenbrock function, cf. [28],

$$ f(x_1, x_2) = \tfrac{1}{4}(x_1 - 1)^2 + (x_2 - 2x_1^2 + 1)^2, $$

where we solve the problem (1) by Newton's method and TTR with the initial point $x_0 = (-0.61, -1)$. It
is clear that (1,1) is the optimizer. The implementation indicates that Newton’s method needs 1 iterations 
and 8 function evaluations, while monotone trust-region method needs 22 iterations and 2f function 
evaluations. We depict the contour plot of the objective and the iterates, as well as a diagram of function values versus iterations attained by these two algorithms, in Figure 1. Subfigure (a) of Figure 1 shows that the iterates of TTR follow the bottom of the valley, in contrast to those of Newton's method, which can go up and down to reach the $\varepsilon$-solution with the accuracy parameter $\varepsilon = 10^{-5}$. We see that Newton's method attains larger steps than TTR. Subfigure (b) of Figure 1 illustrates function values versus iterations for both algorithms, showing that the function values of TTR decrease monotonically, while those of Newton's method fluctuate nonmonotonically.




(a) Nes-Cheb-Rosen contour plot & iterations (b) function values versus iterations

Fig. 1: A comparison between Newton's method and TTR: Subfigure (a) illustrates the contour plot of the two-dimensional Nesterov-Chebyshev-Rosenbrock function and the iterates of Newton's method and TTR; Subfigure (b) shows the diagram of function values versus iterations.
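For readers who wish to reproduce the setting of Example 1, the test function and its gradient can be coded directly. The sketch below assumes the coefficient 1/4 on the first term (the usual form of the Nesterov-Chebyshev-Rosenbrock function); the names `ncr`, `ncr_grad`, and `fd_grad` are hypothetical helpers, and the finite-difference comparison is only a sanity check, not part of the reported experiments.

```python
def ncr(x):
    """Two-dimensional Nesterov-Chebyshev-Rosenbrock function (1/4 factor assumed)."""
    x1, x2 = x
    return 0.25 * (x1 - 1.0) ** 2 + (x2 - 2.0 * x1 ** 2 + 1.0) ** 2


def ncr_grad(x):
    """Analytic gradient of ncr."""
    x1, x2 = x
    r = x2 - 2.0 * x1 ** 2 + 1.0          # residual of the curved valley
    return [0.5 * (x1 - 1.0) - 8.0 * x1 * r, 2.0 * r]


def fd_grad(f, x, h=1e-6):
    """Central finite-difference gradient, used only to check ncr_grad."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


x0 = [-0.61, -1.0]                         # initial point of Example 1
gap = max(abs(a - b) for a, b in zip(ncr_grad(x0), fd_grad(ncr, x0)))
```

The point $(1, 1)$ makes both squared terms vanish, so it is the global minimizer with $f = 0$, and the valley $x_2 = 2x_1^2 - 1$ is the curved narrow region that the monotone iterates are forced to follow.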


In general, monotonicity may result in slow iterative schemes for highly nonlinear or badly-scaled problems. To avoid this algorithmic limitation, the idea of nonmonotone strategies has been proposed, tracing back to the watch-dog technique to overcome the Maratos effect for constrained optimization p2]. To improve the performance of Armijo's line search, Grippo et al. in 1986 m proposed the modified Armijo rule

$$ f(x_k + \alpha_k d_k) \le f_{l(k)} + \sigma \alpha_k g_k^T d_k, \quad k = 0, 1, 2, \ldots, $$

with the step-size $\alpha_k > 0$, $\sigma \in (0, 1/2)$, and

$$ f_{l(k)} = \max_{0 \le j \le m(k)} \{ f_{k-j} \}, \tag{5} $$

where $m(0) = 0$ and $m(k) \le \min\{m(k-1) + 1, N\}$ for a nonnegative integer $N$. It was shown that the associated scheme is globally convergent, and numerical results reported in Grippo et al. m and Toint m showed the effectiveness of the proposed idea. Motivated by these results, nonmonotone strategies have received much attention during the past few decades. For example, in 2004, Zhang & Hager in [3HJ proposed the nonmonotone term

$$ C_k = \begin{cases} f_0 & \text{if } k = 0, \\ (\eta_{k-1} Q_{k-1} C_{k-1} + f(x_k)) / Q_k & \text{if } k \ge 1, \end{cases} \qquad Q_k = \begin{cases} 1 & \text{if } k = 0, \\ \eta_{k-1} Q_{k-1} + 1 & \text{if } k \ge 1, \end{cases} \tag{6} $$
where $0 \le \eta_{\min} \le \eta_{k-1} \le \eta_{\max} \le 1$. Recently, Mo et al. in [30] and Ahookhosh et al. in [3] studied the nonmonotone term

$$ D_k = \begin{cases} f_k & \text{if } k = 1, \\ \eta_k D_{k-1} + (1 - \eta_k) f_k & \text{if } k \ge 2, \end{cases} \tag{7} $$

where $\eta_k \in [\eta_{\min}, \eta_{\max}]$, $\eta_{\min} \in [0, 1]$, and $\eta_{\max} \in [\eta_{\min}, 1]$. More recently, Amini et al. in [7] proposed the nonmonotone term

$$ R_k = \eta_k f_{l(k)} + (1 - \eta_k) f_k, \tag{8} $$

where $0 \le \eta_{\min} \le \eta_{\max} \le 1$ and $\eta_k \in [\eta_{\min}, \eta_{\max}]$. In all cases it was proved that the schemes are globally convergent and perform better than monotone ones.
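To make the nonmonotone terms (5), (6), and (8) concrete, they can be computed from a stored list of function values as in the following sketch. The function names are hypothetical, a constant weight $\eta_k = \eta$ is assumed for simplicity, and $m(k) = \min\{k, N\}$ is chosen as one admissible memory length satisfying $m(k) \le \min\{m(k-1) + 1, N\}$.

```python
def grippo_term(fvals, N):
    """f_{l(k)} of (5) for k = len(fvals) - 1, assuming m(k) = min(k, N)."""
    k = len(fvals) - 1
    m = min(k, N)
    return max(fvals[k - m:])


def zhang_hager_terms(fvals, eta):
    """C_0, ..., C_k of (6) with a constant weight eta (an assumption)."""
    C, Q = fvals[0], 1.0
    out = [C]
    for fk in fvals[1:]:
        Q_new = eta * Q + 1.0
        C = (eta * Q * C + fk) / Q_new
        Q = Q_new
        out.append(C)
    return out


def amini_term(fvals, N, eta):
    """R_k of (8): a convex combination of f_{l(k)} and the current value f_k."""
    return eta * grippo_term(fvals, N) + (1.0 - eta) * fvals[-1]


fvals = [5.0, 3.0, 4.0, 1.0]          # hypothetical function values f_0, ..., f_3
reference_max = grippo_term(fvals, 2)
reference_avg = zhang_hager_terms(fvals, 1.0)
reference_mix = amini_term(fvals, 2, 0.5)
```

With $\eta = 1$ the term $C_k$ reduces to the running average of all function values, and with $\eta = 0$ it reduces to $f_k$, i.e., to the monotone scheme; the weight therefore controls the degree of nonmonotonicity.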

Just as nonmonotone strategies are important for inexact line search techniques, their combination with trust-region methods is of interest. Historically, the first nonmonotone trust-region method was proposed in 1993 by Deng et al. in m for unconstrained optimization. Under some classical assumptions, global convergence and a local superlinear convergence rate were established. Nonmonotone trust-region methods were also studied by several authors such as Toint [41], Xiao & Zhou [43], Xiao & Chu [44], Zhou & Xiao [47], Ahookhosh & Amini [2], Amini & Ahookhosh [B], and Mo et al. m. Recently, Ahookhosh & Amini in [1] and Ahookhosh et al. in [|] proposed two nonmonotone trust-region methods using the nonmonotone term (8). Theoretical results were reported, and numerical results showed the efficiency of the proposed nonmonotone methods.


Content. In this paper we propose a trust-region method equipped with two novel nonmonotone terms. More precisely, we first establish two nonmonotone terms and then combine them with Algorithm 1 to construct two nonmonotone trust-region algorithms. If $k \ge N$, the new nonmonotone terms are defined by a convex combination of the last $N$ successful function values, and if $k < N$, either a convex combination of $k$ successful function values or $f_{l(k)}$ is used. Global convergence to first- and second-order stationary points is established under some classical assumptions. Moreover, local superlinear and quadratic convergence rates for the proposed methods are studied. Numerical results for some highly nonlinear problems and for 112 unconstrained test problems from the CUTEst test collection [24] are reported, indicating the efficiency of the proposed nonmonotone terms.

The remainder of the paper is organized as follows. In Section 2 we propose new nonmonotone terms and their combination with the trust-region framework. The global convergence of the proposed methods is established in Section 3. Numerical results are reported in Section 4. Finally, some conclusions are given in Section 5.


2 Novel nonmonotone terms and algorithm 

In this section we first present two novel nonmonotone terms and then combine them with the trust-region framework to introduce two nonmonotone trust-region algorithms for solving the unconstrained optimization problem (1).

We first assume that $k$ denotes the current iteration and $N \in \mathbb{N}$ is a constant. The main idea is to construct a nonmonotone term determined by a convex combination of the last $k$ successful function values if $k < N$ and by a convex combination of the last $N$ successful function values if $k \ge N$. In other words, we construct new terms using function values collected in the set

$$ \mathcal{T}_k = \begin{cases} \{f_0, f_1, \ldots, f_k\} & \text{if } k < N, \\ \{f_{k-N+1}, f_{k-N+2}, \ldots, f_k\} & \text{if } k \ge N, \end{cases} $$

which should be updated in each iteration. To this end, motivated by the term (401, we construct $T_k$ using the subsequent procedure

$$ \begin{cases} T_0 = f_0 & \text{if } k = 0, \\ T_1 = (1 - \eta_0) f_1 + \eta_0 f_0 & \text{if } k = 1, \\ T_2 = (1 - \eta_1) f_2 + \eta_1 (1 - \eta_0) f_1 + \eta_1 \eta_0 f_0 & \text{if } k = 2, \\ \quad \vdots \\ T_{N-1} = (1 - \eta_{N-2}) f_{N-1} + \eta_{N-2} (1 - \eta_{N-3}) f_{N-2} + \cdots + \eta_{N-2} \cdots \eta_0 f_0 & \text{if } k = N - 1, \\ T_N = (1 - \eta_{N-1}) f_N + \eta_{N-1} (1 - \eta_{N-2}) f_{N-1} + \cdots + \eta_{N-1} \cdots \eta_0 f_0 & \text{if } k = N, \end{cases} \tag{9} $$
where $\eta_i \in [0, 1)$, for $i = 1, 2, \ldots, N$, are some weight parameters. Hence the new term is generated by

$$ T_k := \begin{cases} (1 - \eta_{k-1}) f_k + \eta_{k-1} T_{k-1} & \text{if } k < N, \\ (1 - \eta_{k-1}) f_k + \eta_{k-1} (1 - \eta_{k-2}) f_{k-1} + \cdots + \eta_{k-1} \cdots \eta_{k-N} f_{k-N} & \text{if } k \ge N, \end{cases} \tag{10} $$

where $T_0 = f_0$ and $\eta_i \in [0, 1)$ for $i = 1, 2, \ldots, k$. To show that $T_k$ is a convex combination of the collected function values $\mathcal{T}_k$, it is enough to show that the multipliers sum to unity. For $k \ge N$, the definition of $T_k$ implies

$$ (1 - \eta_{k-1}) + \eta_{k-1} (1 - \eta_{k-2}) + \cdots + \eta_{k-1} \cdots \eta_{k-N+1} (1 - \eta_{k-N}) + \eta_{k-1} \cdots \eta_{k-N} = 1. \tag{11} $$

For $k < N$, a similar summation of the multipliers is equal to one. Therefore, the generated term $T_k$ is a convex combination of the elements of $\mathcal{T}_k$.
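The identity (11) is a telescoping sum. As a worked check, write $P_j := \eta_{k-1} \cdots \eta_{k-j}$ for the partial products, with $P_0 := 1$ (the symbol $P_j$ is introduced here only for illustration). Since $P_j \, \eta_{k-j-1} = P_{j+1}$, the multipliers satisfy:

```latex
\begin{aligned}
\sum_{j=0}^{N-1} P_j \,(1 - \eta_{k-j-1}) + P_N
  &= \sum_{j=0}^{N-1} \bigl( P_j - P_{j+1} \bigr) + P_N \\
  &= P_0 - P_N + P_N \\
  &= 1 .
\end{aligned}
```

Each multiplier is nonnegative because $\eta_i \in [0, 1)$, so the weights sum to one and are nonnegative, which is exactly the requirement for a convex combination.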

The procedure of defining $T_k$ clearly implies that the set $\mathcal{T}_k$ should be updated and saved in each iteration. Moreover, $N(N+1)/2$ multiplications are required to compute $T_k$. To avoid saving $\mathcal{T}_k$ and to decrease the number of multiplications, we derive a recursive formula for (10). From the definition of $T_k$, for $k > N$, it follows that

$$ \begin{aligned} T_k - \eta_{k-1} T_{k-1} &= (1 - \eta_{k-1}) f_k + \eta_{k-1} (1 - \eta_{k-2}) f_{k-1} + \cdots + \eta_{k-1} \cdots \eta_{k-N} f_{k-N} \\ &\quad - \eta_{k-1} (1 - \eta_{k-2}) f_{k-1} - \cdots - \eta_{k-1} \cdots (1 - \eta_{k-N-1}) f_{k-N} - \eta_{k-1} \eta_{k-2} \cdots \eta_{k-N-1} f_{k-N-1} \\ &= (1 - \eta_{k-1}) f_k + \eta_{k-1} \eta_{k-2} \cdots \eta_{k-N-1} (f_{k-N} - f_{k-N-1}) \\ &= (1 - \eta_{k-1}) f_k + \xi_k (f_{k-N} - f_{k-N-1}), \end{aligned} $$

where $\xi_k := \eta_{k-1} \eta_{k-2} \cdots \eta_{k-N-1}$. For $k > N$, this equation leads to

$$ T_k = (1 - \eta_{k-1}) f_k + \eta_{k-1} T_{k-1} + \xi_k (f_{k-N} - f_{k-N-1}), \tag{12} $$

which requires saving only $f_{k-N}$ and $f_{k-N-1}$ and needs only three multiplications. Moreover, the definition of $\xi_k$ implies

$$ \xi_k = \eta_{k-1} \eta_{k-2} \cdots \eta_{k-N-1} = \frac{\eta_{k-1}}{\eta_{k-N-2}} \, \eta_{k-2} \eta_{k-3} \cdots \eta_{k-N-2} = \frac{\eta_{k-1}}{\eta_{k-N-2}} \, \xi_{k-1}. \tag{13} $$

If $\xi_k$ is recursively updated by (13), then (10) and (12) define a new nonmonotone term

$$ T_k := \begin{cases} f_k + \eta_{k-1} (T_{k-1} - f_k) & \text{if } k < N, \\ \max\{T_k, f_k\} & \text{if } k \ge N, \end{cases} \tag{14} $$

where the max term is added to guarantee $T_k \ge f_k$.
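The agreement between the direct formula (10) and the recursion (12) with the update (13) can be checked numerically. The following sketch is only an illustration under stated assumptions: all weights are taken strictly positive so that the division in (13) is defined, the max safeguard of (14) is not applied, and the names `direct_T` and `recursive_T` as well as the sample data are hypothetical.

```python
def direct_T(f, eta, k, N):
    """T_k from the second branch of (10), valid for k >= N."""
    total, prod = 0.0, 1.0
    for j in range(N):
        # weight of f_{k-j}: eta_{k-1} ... eta_{k-j} * (1 - eta_{k-j-1})
        total += prod * (1.0 - eta[k - j - 1]) * f[k - j]
        prod *= eta[k - j - 1]
    return total + prod * f[k - N]       # last weight: eta_{k-1} ... eta_{k-N}


def recursive_T(f, eta, N):
    """T_0, ..., T_K via T_k = (1 - eta_{k-1}) f_k + eta_{k-1} T_{k-1},
    plus the correction xi_k (f_{k-N} - f_{k-N-1}) of (12) when k > N."""
    T = [f[0]]
    xi = None
    for k in range(1, len(f)):
        Tk = (1.0 - eta[k - 1]) * f[k] + eta[k - 1] * T[k - 1]
        if k > N:
            if xi is None:               # first use, k = N + 1: xi = eta_0 ... eta_N
                xi = 1.0
                for i in range(k - N - 1, k):
                    xi *= eta[i]
            else:                        # update (13); needs eta_{k-N-2} != 0
                xi *= eta[k - 1] / eta[k - N - 2]
            Tk += xi * (f[k - N] - f[k - N - 1])
        T.append(Tk)
    return T


f_hist = [10.0, 8.0, 9.0, 5.0, 6.0, 3.0, 4.0, 1.0]      # hypothetical values
eta_hist = [0.5, 0.25, 0.75, 0.5, 0.25, 0.5, 0.75, 0.5]  # hypothetical weights
N_mem = 3
T_rec = recursive_T(f_hist, eta_hist, N_mem)
```

The recursion touches only $f_k$, $f_{k-N}$, $f_{k-N-1}$, and the previous term, whereas `direct_T` re-reads the whole window; this is the storage and multiplication saving claimed for (12).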

As discussed in Section 1, nonmonotone schemes perform better when they use a stronger nonmonotone term far away from the optimizer and a weaker one close to it. This motivates us to consider a new version of the derived nonmonotone term by using $f_{l(k)}$ in cases where $k < N$. More precisely, the second nonmonotone term is defined by

$$ T_k := \begin{cases} f_{l(k)} & \text{if } k < N, \\ \max\{T_k, f_k\} & \text{if } k \ge N, \end{cases} \tag{15} $$

where $\xi_k$ is defined by (13). It is clear that the new term uses the stronger term $f_{l(k)}$ defined by (5) for the first $k < N$ iterations and then employs the relaxed convex term proposed above.

Now, to employ the proposed nonmonotone terms in the trust-region framework, it is enough to replace the ratio $r_k$ in (4) by the nonmonotone ratio

$$ \hat{r}_k = \frac{T_k - f(x_k + d)}{q_k(0) - q_k(d)}, \tag{16} $$

where $T_k$ is defined by (14) or (15). Notice that if $\hat{r}_k \ge \mu_1$, then

$$ T_k - f_{k+1} \ge \mu_1 (q_k(0) - q_k(d_k)) > 0. $$

This implies that $f_{k+1}$ can be larger than $f_k$; however, the elements of $\{f_k\}$ cannot increase arbitrarily, and the maximum increase is controlled by the nonmonotone term $T_k$. Moreover, the definitions (14) and (15) imply that $\hat{r}_k \ge r_k$, increasing the possibility of attaining larger steps for nonmonotone schemes compared with monotone ones.
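The effect of replacing (4) by (16) can be seen on a single trial step. The sketch below uses hypothetical numbers and a hypothetical helper name `ratios`; it only evaluates the two ratios for the same trial point and predicted reduction.

```python
def ratios(f_k, T_k, f_trial, pred):
    """Return (r_k, r_hat_k) from (4) and (16); pred = q_k(0) - q_k(d) must be positive."""
    if pred <= 0:
        raise ValueError("predicted reduction must be positive")
    return (f_k - f_trial) / pred, (T_k - f_trial) / pred


mu1 = 0.05                                   # acceptance threshold (illustrative value)
# Hypothetical step: the trial value 1.25 is above f_k = 1.0 but below T_k = 1.5.
r_k, r_hat_k = ratios(f_k=1.0, T_k=1.5, f_trial=1.25, pred=0.5)
monotone_accepts = r_k >= mu1
nonmonotone_accepts = r_hat_k >= mu1
```

Here the monotone test rejects the step because the objective increased, while the nonmonotone test accepts it because the trial value is still sufficiently below the reference $T_k$; since $T_k \ge f_k$, the nonmonotone ratio can never be smaller than the monotone one.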
The above-mentioned discussion leads to the following nonmonotone trust-region algorithm:


Algorithm 2: NMTR (nonmonotone traditional trust-region algorithm)

Input: $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$, $k_{\max}$; $0 < \mu_1 \le \mu_2 \le 1$, $0 < \rho_1 < 1 \le \rho_2$, $\varepsilon > 0$;
Output: $x_b$; $f_b$;
 1 begin
 2     $\delta_0 \leftarrow \|g_0\|$; $k \leftarrow 0$;
 3     while $\|g_k\| \ge \varepsilon$ & $k \le k_{\max}$ do
 4         solve the subproblem (3) to specify $d_k$;
 5         $\hat{x}_k \leftarrow x_k + d_k$; compute $f(\hat{x}_k)$;
 6         determine $\hat{r}_k$ using (16);
 7         while $\hat{r}_k < \mu_1$ do
 8             $\delta_k \leftarrow \rho_1 \delta_k$;
 9             solve the subproblem (3) to specify $d_k$;
10             $\hat{x}_k \leftarrow x_k + d_k$; compute $f(\hat{x}_k)$;
11             determine $\hat{r}_k$ using (16);
12         end
13         $x_{k+1} \leftarrow \hat{x}_k$;
14         if $\hat{r}_k \ge \mu_2$ then
15             $\delta_{k+1} \leftarrow \rho_2 \delta_k$;
16         end
17         update $B_{k+1}$; $k \leftarrow k + 1$;
18         update $T_{k+1}$;
19         update $\eta_{k+1}$;
20     end
21     $x_b \leftarrow x_k$; $f_b \leftarrow f_k$;
22 end

In Algorithm 2, if $\hat{r}_k \ge \mu_1$ (Line 7), the iteration is called successful, and if $\hat{r}_k \ge \mu_2$ (Line 14), it is called very successful. In addition, the loop from Line 3 to Line 20 is called the outer cycle, and the loop from Line 7 to Line 12 is called the inner cycle.


3 Convergence analysis 

This section is concerned with the global convergence to first- and second-order stationary points of the sequence $\{x_k\}$ generated by Algorithm 2. More precisely, we intend to prove that every limit point $x^*$ of the sequence $\{x_k\}$ satisfies the condition $g(x^*) = 0$, and that there exists a point $x^*$ satisfying $g(x^*) = 0$ at which $H(x^*)$ is positive semidefinite. Furthermore, we show that Algorithm 2 is well-defined, which means that the inner cycle of the algorithm is left after a finite number of internal iterations, and then prove its global convergence. Moreover, local superlinear and quadratic convergence rates are investigated under some classical assumptions.

To prove the global convergence of the sequence $\{x_k\}$ generated by Algorithm 2, we make the following assumptions:

(H1) The objective function $f$ is continuously differentiable and has a lower bound on the upper level set $L(x_0) = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$.

(H2) The sequence $\{B_k\}$ is uniformly bounded, i.e., there exists a constant $M > 0$ such that

$$ \|B_k\| \le M, $$

for all $k \in \mathbb{N}$.

(H3) There exists a constant $c > 0$ such that the trial step $d_k$ satisfies $\|d_k\| \le c \|g_k\|$.
We also assume that the decrease in the model $q_k$ is at least a fraction of the decrease obtained by the Cauchy point, which guarantees that there exists a constant $\beta \in (0, 1)$ such that

$$ q_k(0) - q_k(d_k) \ge \beta \|g_k\| \min\left\{ \delta_k, \frac{\|g_k\|}{\|B_k\|} \right\}, \tag{17} $$

for all $k$. This condition is called the sufficient reduction condition. Inequality (17) implies that $d_k \ne 0$ whenever $g_k \ne 0$. Note that several schemes can solve the trust-region subproblem (3) such that (17) is valid, see, for example, [111311-
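One standard way to satisfy (17) is to take the Cauchy step, for which the inequality is known to hold with $\beta = 1/2$ in the Euclidean norm. The sketch below checks this numerically on a few hypothetical data sets, including indefinite matrices. It is an illustration under stated assumptions: the function names are hypothetical, and the Frobenius norm is used for $\|B_k\|$, which can only make the right-hand side of (17) smaller than with the spectral norm.

```python
import math


def cauchy_decrease(g, B, delta):
    """Predicted reduction q(0) - q(d) at the Cauchy step d = -t * g."""
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    Bg = [sum(bij * gj for bij, gj in zip(row, g)) for row in B]
    gBg = sum(gi * bi for gi, bi in zip(g, Bg))
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (delta * gBg))
    t = tau * delta / gnorm
    return t * gnorm ** 2 - 0.5 * t * t * gBg


def sufficient_reduction_bound(g, B, delta, beta=0.5):
    """Right-hand side of (17), with the Frobenius norm of B (an assumption)."""
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    normB = math.sqrt(sum(bij * bij for row in B for bij in row))
    ratio = gnorm / normB if normB > 0 else float("inf")
    return beta * gnorm * min(delta, ratio)


g_test = [1.0, 2.0]
B_tests = [
    [[2.0, 0.0], [0.0, 20.0]],     # positive definite
    [[1.0, 0.0], [0.0, -3.0]],     # indefinite
    [[-1.0, 0.5], [0.5, -2.0]],    # negative definite
]
checks = [
    cauchy_decrease(g_test, B, delta) >= sufficient_reduction_bound(g_test, B, delta) - 1e-12
    for B in B_tests
    for delta in (0.01, 1.0, 100.0)
]
```

Any subproblem solver whose step reduces the model at least as much as the Cauchy step inherits the same bound, which is why (17) is a mild requirement in practice.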

Lemma 2 Suppose that the sequence $\{x_k\}$ is generated by Algorithm 2, then

$$ | f_k - f(x_k + d_k) - (q_k(0) - q_k(d_k)) | \le O(\|d_k\|^2). $$

Proof The proof can be found in QU. □

Lemma 3 Suppose that the sequence $\{x_k\}$ is generated by Algorithm 2, then we get

$$ f_k \le T_k \le f_{l(k)}, \tag{18} $$

for all $k \in \mathbb{N} \cup \{0\}$.

Proof For $k < N$, we consider two cases: (i) $T_k$ is defined by (14); (ii) $T_k$ is defined by (15). In Case (i), Lemma 2.1 in [3], the fact that $f_i \le f_{l(k)}$ for $i = 0, 1, \ldots, k$, and the fact that the multipliers in $T_k$ sum to one give the result. Case (ii) is evident from (15).

For $k \ge N$, if $T_k = f_k$, the result is evident. Otherwise, since

$$ (1 - \eta_{k-1}) + \eta_{k-1} (1 - \eta_{k-2}) + \cdots + \eta_{k-1} \cdots \eta_{k-N+1} (1 - \eta_{k-N}) + \eta_{k-1} \cdots \eta_{k-N} = 1, \tag{19} $$

the fact that $f_i \le f_{l(k)}$, for $i = k - N + 1, \ldots, k$, and (10) imply

$$ \begin{aligned} f_k \le T_k &= (1 - \eta_{k-1}) f_k + \eta_{k-1} (1 - \eta_{k-2}) f_{k-1} + \cdots + \eta_{k-1} \cdots \eta_{k-N} f_{k-N} \\ &\le [(1 - \eta_{k-1}) + \eta_{k-1} (1 - \eta_{k-2}) + \cdots + \eta_{k-1} \cdots \eta_{k-N}] f_{l(k)} = f_{l(k)}, \end{aligned} $$

giving the result. □


Lemma 4 Suppose that the sequence $\{x_k\}$ is generated by Algorithm 2, then the sequence $\{f_{l(k)}\}$ is decreasing.

Proof The condition (18) implies that $T_k \le f_{l(k)}$. If $x_{k+1}$ is accepted by Algorithm 2, then

$$ \frac{f_{l(k)} - f(x_k + d_k)}{q_k(0) - q_k(d_k)} \ge \frac{T_k - f(x_k + d_k)}{q_k(0) - q_k(d_k)} \ge \mu_1, $$

leading to

$$ f_{l(k)} - f(x_k + d_k) \ge \mu_1 (q_k(0) - q_k(d_k)) \ge 0, \quad \text{for all } k \in \mathbb{N}, $$

implying

$$ f_{l(k)} \ge f_{k+1}, \quad \text{for all } k \in \mathbb{N}. \tag{20} $$

Now, if $k \ge N$, by using $m(k+1) \le m(k) + 1$ and (20), we get

$$ f_{l(k+1)} = \max_{0 \le j \le m(k+1)} \{f_{k-j+1}\} \le \max_{0 \le j \le m(k)+1} \{f_{k-j+1}\} = \max\{f_{l(k)}, f_{k+1}\} \le f_{l(k)}. $$

For $k < N$, it is obvious that $m(k) = k$. Since, for any $k$, $f_k \le f_0$, it is clear that $f_{l(k)} = f_0$. Therefore, in both cases, the sequence $\{f_{l(k)}\}$ is decreasing. □

Lemma 5 Suppose that (H1) holds and the sequence $\{x_k\}$ is generated by Algorithm 2, then $L(x_0)$ contains $\{x_k\}$.

Proof The definition of $T_k$ indicates that $T_0 = f_0$. By induction, we assume that $x_i \in L(x_0)$, for all $i = 1, 2, \ldots, k$, and then prove that $x_{k+1} \in L(x_0)$. From (18) and Lemma 4 we get

$$ f_{k+1} \le T_{k+1} \le f_{l(k+1)} \le f_{l(k)} \le f_0, $$

implying that $L(x_0)$ contains the sequence $\{x_k\}$. □
Corollary 6 Suppose that (H1) holds and the sequence $\{x_k\}$ is generated by Algorithm 2. Then the sequence $\{f_{l(k)}\}$ is convergent.

Proof Assumption (H1) and Lemma 4 imply that there exists a constant $\lambda$ such that

$$ \lambda \le f_{k+n} \le f_{l(k+n)} \le \cdots \le f_{l(k+1)} \le f_{l(k)}, $$

for all $n \in \mathbb{N}$. This implies that the sequence $\{f_{l(k)}\}$ is convergent. □

Lemma 7 Suppose that (H1)-(H3) hold and the sequence $\{x_k\}$ is generated by Algorithm 2, then

$$ \lim_{k \to \infty} f_{l(k)} = \lim_{k \to \infty} f_k. \tag{21} $$

Proof The condition (18) and Lemma 7 of [1] imply that the result is valid. □

Corollary 8 Suppose that (H1)-(H3) hold and the sequence $\{x_k\}$ is generated by Algorithm 2, then we have

$$ \lim_{k \to \infty} T_k = \lim_{k \to \infty} f_k. \tag{22} $$

Proof From (18) and Lemma 7 the result is obtained. □


Lemma 9 Suppose that (H1) and (H2) hold, and the sequence $\{x_k\}$ is generated by Algorithm 2. Then, if $\|g_k\| \ge \varepsilon > 0$, we have

(i) the inner cycle of Algorithm 2 is well-defined;

(ii) for any $k$, there exists a nonnegative integer $p$ such that $x_{k+p+1}$ is a very successful iteration.

Proof (i) Let $t_k$ denote the internal iteration counter in step $k$, and let $d_k^{t_k}$ and $\delta_k^{t_k}$ respectively denote the solution of the subproblem (3) and the corresponding trust-region radius in the internal iteration $t_k$. The fact that $\|g_k\| \ge \varepsilon > 0$, (H2), and (17) imply

$$ q_k(0) - q_k(d_k^{t_k}) \ge \beta \|g_k\| \min\left\{ \delta_k^{t_k}, \frac{\|g_k\|}{\|B_k\|} \right\} \ge \beta \varepsilon \min\left\{ \delta_k^{t_k}, \frac{\varepsilon}{M} \right\}. \tag{23} $$

Then Line 8 of Algorithm 2 implies

$$ \lim_{t_k \to \infty} \delta_k^{t_k} = 0. $$

From this, Lemma 2, and (23), we obtain

$$ \begin{aligned} |r_k - 1| &= \left| \frac{f_k - f(x_k + d_k^{t_k})}{q_k(0) - q_k(d_k^{t_k})} - 1 \right| = \left| \frac{f_k - f(x_k + d_k^{t_k}) - (q_k(0) - q_k(d_k^{t_k}))}{q_k(0) - q_k(d_k^{t_k})} \right| \\ &\le \frac{O(\|d_k^{t_k}\|^2)}{\beta \varepsilon \min\{\delta_k^{t_k}, \varepsilon / M\}} \le \frac{O((\delta_k^{t_k})^2)}{\beta \varepsilon \min\{\delta_k^{t_k}, \varepsilon / M\}} \to 0 \quad (t_k \to \infty), \end{aligned} $$

implying that there exists a positive integer $k_0$ such that for $k \ge k_0$ we have $r_k \ge \mu_1$. This and (18) lead to

$$ \hat{r}_k = \frac{T_k - f(x_k + d_k^{t_k})}{q_k(0) - q_k(d_k^{t_k})} \ge \frac{f_k - f(x_k + d_k^{t_k})}{q_k(0) - q_k(d_k^{t_k})} \ge \mu_1, $$

implying that the inner cycle is well-defined.

(ii) Assume that there exists a positive integer $k$ such that for an arbitrary positive integer $p$ the point $x_{k+p+1}$ is not very successful. Hence, for any constant $p = 0, 1, 2, \ldots$, we get

$$ \hat{r}_{k+p} < \mu_2. $$

The fact that $\|g_k\| \ge \varepsilon > 0$, (H2), and (17) imply

$$ \begin{aligned} T_{k+p} - f(x_{k+p} + d_{k+p}) &\ge \mu_1 (q_{k+p}(0) - q_{k+p}(d_{k+p})) \ge \beta \mu_1 \|g_{k+p}\| \min\left\{ \delta_{k+p}, \frac{\|g_{k+p}\|}{\|B_{k+p}\|} \right\} \\ &\ge \beta \mu_1 \varepsilon \min\left\{ \delta_{k+p}, \frac{\varepsilon}{M} \right\}. \end{aligned} \tag{24} $$

By using (22) and (24), we can write

$$ \lim_{p \to \infty} \delta_{k+p} = 0. $$

From Lemma 2 and (23), we obtain

$$ \begin{aligned} |r_{k+p} - 1| &= \left| \frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{q_{k+p}(0) - q_{k+p}(d_{k+p})} - 1 \right| = \left| \frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p}) - (q_{k+p}(0) - q_{k+p}(d_{k+p}))}{q_{k+p}(0) - q_{k+p}(d_{k+p})} \right| \\ &\le \frac{O(\|d_{k+p}\|^2)}{\beta \varepsilon \min\{\delta_{k+p}, \varepsilon / M\}} \le \frac{O(\delta_{k+p}^2)}{\beta \varepsilon \min\{\delta_{k+p}, \varepsilon / M\}} \to 0 \quad (p \to \infty). \end{aligned} \tag{25} $$

Then, for a sufficiently large $p$, we get $r_{k+p} \ge \mu_2$, leading to

$$ \hat{r}_{k+p} = \frac{T_{k+p} - f(x_{k+p} + d_{k+p})}{q_{k+p}(0) - q_{k+p}(d_{k+p})} \ge \frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{q_{k+p}(0) - q_{k+p}(d_{k+p})} \ge \mu_2, $$

implying $\hat{r}_{k+p} \ge \mu_2$ for a sufficiently large $p$. This contradicts the assumption $\hat{r}_{k+p} < \mu_2$, giving the result. □


Lemma 9(i) implies that the inner cycle is left after a finite number of internal iterations, and Lemma 9(ii) implies that if the current iterate is not a first-order stationary point, then there exists at least one very successful iteration, i.e., the trust-region radius $\delta_k$ can be enlarged. The next result gives the global convergence of the sequence $\{x_k\}$ of Algorithm 2.


Theorem 10 Suppose that (H1) and (H2) hold, and suppose the sequence $\{x_k\}$ is generated by Algorithm 2. Then

$$ \liminf_{k \to \infty} \|g_k\| = 0. \tag{26} $$

Proof We consider two cases: (i) Algorithm 2 has finitely many very successful iterations; (ii) Algorithm 2 has infinitely many very successful iterations.

In Case (i), let $k_0$ be the largest index of the very successful iterations. If $\|g_{k_0+1}\| > 0$, then Lemma 9(ii) implies that there exists a very successful iteration with a larger index than $k_0$. This contradicts the definition of $k_0$.

In Case (ii), by contradiction, we assume that there exist constants $\varepsilon > 0$ and $K > 0$ such that

$$ \|g_k\| \ge \varepsilon, \tag{27} $$

for all $k \ge K$. If $x_{k+1}$ is a successful iteration and $k \ge K$, then by using (H2), (17), and (27), we get

$$ T_k - f(x_k + d_k) \ge \mu_1 (q_k(0) - q_k(d_k)) \ge \beta \mu_1 \|g_k\| \min\left\{ \delta_k, \frac{\|g_k\|}{\|B_k\|} \right\} \ge \beta \mu_1 \varepsilon \min\left\{ \delta_k, \frac{\varepsilon}{M} \right\} > 0. \tag{28} $$

It follows from this inequality and (22) that

$$ \lim_{k \to \infty} \delta_k = 0. \tag{29} $$

Since Algorithm 2 has infinitely many very successful iterations, Lemma 9(ii) and (27) imply that the sequence $\{x_k\}$ involves infinitely many very successful iterations in which the trust region is enlarged, which contradicts (29). This implies that the result is valid. □

Theorem 11 Suppose that (H1) and (H2) hold, and the sequence $\{x_k\}$ is generated by Algorithm 2. Then

$$ \lim_{k \to \infty} \|g_k\| = 0. \tag{30} $$

Moreover, no limit point of the sequence $\{x_k\}$ is a local maximizer of $f$.

Proof By contradiction, we assume $\lim_{k \to \infty} \|g_k\| \ne 0$. Hence there exist $\varepsilon > 0$ and an infinite subsequence of $\{x_k\}$, indexed by $\{t_i\}$, such that

$$ \|g_{t_i}\| \ge 2\varepsilon > 0, \tag{31} $$

for all $i \in \mathbb{N}$. Theorem 10 ensures the existence, for each $t_i$, of a first successful iteration $r(t_i) > t_i$ such that $\|g_{r(t_i)}\| < \varepsilon$. We denote $r_i = r(t_i)$. Hence there exists another subsequence, indexed by $\{r_i\}$, such that

$$ \|g_k\| \ge \varepsilon \ \text{ for } t_i \le k < r_i, \qquad \|g_{r_i}\| < \varepsilon. \tag{32} $$

We now restrict our attention to the sequence of successful iterations whose indices are in the set

$$ \mathcal{K} = \{ k \in \mathbb{N} \mid t_i \le k < r_i \}. $$

Using (32), for every $k \in \mathcal{K}$, (28) holds. It follows from (22) and (28) that

$$ \lim_{k \to \infty} \delta_k = 0, \tag{33} $$

for $k \in \mathcal{K}$. Now, using (H2), (17), and $\|g_k\| \ge \varepsilon$, the condition (23) holds for $k \in \mathcal{K}$. This, Lemma 2, and (33) lead to

$$ \begin{aligned} |r_k - 1| &= \left| \frac{f_k - f(x_k + d_k)}{q_k(0) - q_k(d_k)} - 1 \right| = \left| \frac{f_k - f(x_k + d_k) - (q_k(0) - q_k(d_k))}{q_k(0) - q_k(d_k)} \right| \\ &\le \frac{O(\|d_k\|^2)}{\beta \varepsilon \min\{\delta_k, \varepsilon / M\}} \le \frac{O(\delta_k^2)}{\beta \varepsilon \delta_k} \to 0 \quad (k \to \infty, \ k \in \mathcal{K}). \end{aligned} $$

Thus, for a sufficiently large $k + 1 \in \mathcal{K}$, we get

$$ f_k - f(x_k + d_k) \ge \mu_1 (q_k(0) - q_k(d_k)) \ge \beta \mu_1 \|g_k\| \min\left\{ \delta_k, \frac{\|g_k\|}{\|B_k\|} \right\} \ge \beta \mu_1 \varepsilon \min\left\{ \delta_k, \frac{\varepsilon}{M} \right\}. \tag{34} $$

The condition (33) implies that $\delta_k \le \varepsilon / M$. Hence, for a sufficiently large $k \in \mathcal{K}$, we obtain

$$ \delta_k \le \frac{1}{\beta \mu_1 \varepsilon} (f_k - f_{k+1}). \tag{35} $$

Then (18) and (35) imply

$$ \|x_{t_i} - x_{r_i}\| \le \sum_{j \in \mathcal{K}, \, j = t_i}^{r_i - 1} \|x_j - x_{j+1}\| \le \sum_{j \in \mathcal{K}, \, j = t_i}^{r_i - 1} \delta_j \le \frac{1}{\beta \mu_1 \varepsilon} (f_{t_i} - f_{r_i}) \le \frac{1}{\beta \mu_1 \varepsilon} (T_{t_i} - f_{r_i}), $$

for a sufficiently large $i$. Now, Corollary 8 implies

$$ 0 \le \lim_{i \to \infty} \|x_{t_i} - x_{r_i}\| \le \lim_{i \to \infty} \frac{1}{\beta \mu_1 \varepsilon} (T_{t_i} - f_{r_i}) = 0, $$

leading to

$$ \lim_{i \to \infty} \|x_{t_i} - x_{r_i}\| = 0. \tag{36} $$

Since the gradient is continuous, we get

$$ \lim_{i \to \infty} \|g_{t_i} - g_{r_i}\| = 0. \tag{37} $$

In view of the definitions of $\{t_i\}$ and $\{r_i\}$, this is impossible, since they guarantee $\|g_{t_i} - g_{r_i}\| \ge \varepsilon$. Therefore, there is no subsequence that satisfies (31), giving the result.

To see that no limit point of the sequence $\{x_k\}$ is a local maximizer of $f$, see [27]. □

The next result gives the global convergence of the sequence generated by Algorithm 2 to second-order stationary points. To this end, similar to HE an additional assumption is needed:

(H4) If $\lambda_{\min}(B_k)$ represents the smallest eigenvalue of the symmetric matrix $B_k$, then there exists a positive scalar $c_3$ such that

$$ q_k(0) - q_k(d_k) \ge -c_3 \lambda_{\min}(B_k) \delta_k^2. $$

Theorem 12 Suppose that $f$ is twice continuously differentiable and also suppose that (H1)-(H4) hold. Then there exists a limit point $x^*$ of the sequence $\{x_k\}$ generated by Algorithm 2 such that $\nabla^2 f(x^*)$ is positive semidefinite.

Proof The proof is similar to Theorem 3.4 in [Hj. □


The next two results show that Algorithm 2 can be reduced to quasi-Newton or Newton methods, where the sequence $\{x_k\}$ generated by these schemes can attain local superlinear and quadratic convergence rates, respectively, under some conditions.


Theorem 13 Suppose that (H1)-(H3) hold, and also suppose that the sequence $\{x_k\}$ generated by Algorithm 2 converges to $x^*$, $\|d_k\| = \|-B_k^{-1} g_k\| \le \delta_k$, $H(x) = \nabla^2 f(x)$ is continuous in a neighborhood $N(x^*, \varepsilon)$ of $x^*$, and $B_k$ satisfies

$$ \lim_{k \to \infty} \frac{\|[B_k - H(x^*)] d_k\|}{\|d_k\|} = 0. \tag{38} $$

Then

(i) there exists a constant $k_1$ such that for all $k \ge k_1$ we have $x_{k+1} = x_k + d_k$;

(ii) the sequence $\{x_k\}$ generated by Algorithm 2 converges to $x^*$ superlinearly.


Proof (i) The condition (38) implies

$$ \lim_{k \to \infty} \frac{\|g_k + H(x^*) d_k\|}{\|d_k\|} = 0, \tag{39} $$

leading to

$$ d_k = -H(x^*)^{-1} g_k + o(\|d_k\|). $$

This implies that

$$ \|d_k\| \le \|H(x^*)^{-1}\| \, \|g_k\| + o(\|d_k\|). \tag{40} $$

Theorem 11 implies that $\|g_k\| \to 0$ as $k \to \infty$. This and (40) give

$$ \lim_{k \to \infty} \|d_k\| = 0. \tag{41} $$

This, (18), and (H2) imply

$$ \begin{aligned} |r_k - 1| &= \left| \frac{f_k - f(x_k + d_k)}{q_k(0) - q_k(d_k)} - 1 \right| = \left| \frac{f_k - f(x_k + d_k) - (q_k(0) - q_k(d_k))}{q_k(0) - q_k(d_k)} \right| \\ &\le \frac{O(\|d_k\|^2)}{\beta \varepsilon \min\{\delta_k, \varepsilon / M\}} \le \frac{O(\|d_k\|^2)}{\beta \varepsilon \min\{\|d_k\|, \varepsilon / M\}} \to 0 \quad (k \to \infty). \end{aligned} $$

This clearly implies that there exists a positive integer $k_1$ such that for $k \ge k_1$ we have $x_{k+1} = x_k + d_k$.

(ii) From $d_k = -B_k^{-1} g_k$, we obtain

$$ \frac{\|g_k + H_k d_k\|}{\|d_k\|} = \frac{\|[H_k - B_k] d_k\|}{\|d_k\|} \le \frac{\|[H_k - H(x^*)] d_k\|}{\|d_k\|} + \frac{\|[B_k - H(x^*)] d_k\|}{\|d_k\|}. $$

This and (38) lead to

$$ \lim_{k \to \infty} \frac{\|g_k + H_k d_k\|}{\|d_k\|} = 0. \tag{42} $$

Now Theorem 3.6 in [35] implies that $\{x_k\}$ generated by Algorithm 2 converges to $x^*$ superlinearly. □


Notice that if $f$ is thrice continuously differentiable and the upper level set $L(x_0)$ is bounded, then (H1) implies that $\|\nabla^3 f(x)\|$ is uniformly continuous and bounded on the open bounded convex set $\Omega$ containing $L(x_0)$. Hence, by using the mean value theorem, there exists a constant $L > 0$ such that $\|\nabla^3 f(x)\| \le L$, implying

$$ \|H(x) - H(y)\| \le L \|x - y\|, \tag{43} $$

for all $x, y \in \Omega$. This implies that the Hessian of $f$ is Lipschitz continuous. This condition can guarantee the quadratic convergence of the sequence $\{x_k\}$ generated by Algorithm 2. The details are summarized in the next result.
Theorem 14 Suppose that $f(x)$ is a twice continuously differentiable function on $\mathbb{R}^n$, and all assumptions of Theorem 11 hold. If $\|d_k\| = \|-H_k^{-1} g_k\| \le \delta_k$, and there exists a neighborhood $N(x^*, \varepsilon)$ of $x^*$ such that $H(x)$ is Lipschitz continuous on $N(x^*, \varepsilon)$, i.e., there exists $L$ such that

$$ \|H(x) - H(y)\| \le L \|x - y\|, \tag{44} $$

then

(i) there exists a constant $k_2$ such that for all $k \ge k_2$ we have $x_{k+1} = x_k + d_k$;

(ii) the sequence $\{x_k\}$ generated by Algorithm 2 converges to $x^*$ quadratically.


Proof (i) By replacing $B_k$ by $H_k$ in Theorem 13 we obtain that there exists an integer $k_2 > 0$ such that

$$ x_{k+1} = x_k + d_k, $$

for all $k \ge k_2$.

(ii) The condition described in (i) and Theorem 3.5 in [35] give the results. □


4 Numerical experiments 


In this section we report numerical results for Algorithm 2 equipped with the two novel nonmonotone terms
proposed in Section 2 for solving unconstrained optimization problems. In our experiments we use several
versions of Algorithm 2 employing state-of-the-art nonmonotone terms. In detail, we consider

• NMTR-G: Algorithm 2 with the nonmonotone term of Grippo et al. B3; 

• NMTR-H: Algorithm 2 with the nonmonotone term of Zhang & Hager [151 : 

• NMTR-N: Algorithm 2 with the nonmonotone term of Amini et al. [7];

• NMTR-M: Algorithm 2 with the nonmonotone term of Ahookhosh et al. [3]; 

• NMTR-1: Algorithm 2 with the nonmonotone term (14); 

• NMTR-2: Algorithm 2 with the nonmonotone term (15).

In the experiments we use 112 test problems from the CUTEst test collection [H] with dimensions from 2 to
5000; test problems with dimension greater than 5000 are ignored. All of the codes are written
in MATLAB using the same subroutines, and they are tested in double precision on a laptop with a 2Hz core i5
processor and 4GB of RAM. The initial points are the standard ones proposed in CUTEst.
All the algorithms use the radius


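$$\delta_{k+1} = \begin{cases} c_1 \|d_k\| & \text{if } \hat{r}_k < \mu_1, \\ \delta_k & \text{if } \mu_1 \leq \hat{r}_k < \mu_2, \\ \max\{\delta_k, c_2 \|d_k\|\} & \text{if } \hat{r}_k \geq \mu_2, \end{cases}$$

where

$$\mu_1 = 0.05, \quad \mu_2 = 0.9, \quad c_1 = 0.25, \quad c_2 = 2.5, \quad \delta_0 = 0.1 \|g_0\|,$$

see [26]. In the model $q_k$, an approximation of the Hessian is generated by the BFGS updating formula

$$B_{k+1} = B_k + \frac{y_k y_k^T}{s_k^T y_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k},$$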


where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. For NMTR-G, NMTR-N, NMTR-1 and NMTR-2, we set
$N = 10$. As discussed in [46], NMTR-H uses $\eta_k = 0.85$. On the basis of our experiments, we update the
parameter $\eta_k$ by

$$\eta_k = \begin{cases} \eta_0 / 2 & \text{if } k = 1, \\ (\eta_{k-1} + \eta_{k-2}) / 2 & \text{if } k \geq 2, \end{cases}$$

for NMTR-N, NMTR-M, NMTR-1 and NMTR-2, where the parameter $\eta_0$ will be tuned to get a better
performance. To solve the quadratic subproblem we use the Steihaug-Toint scheme [TS] (Chapter 7,
Page 205), where the scheme is terminated if

$$\|g(x_k + d)\| \leq \min\{1/10, \|g_k\|^{1/2}\} \, \|g_k\| \quad \text{or} \quad \|d\| = \delta_k.$$
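The termination test above fits naturally inside a truncated conjugate-gradient loop. The following Python sketch is a minimal illustration and not the MATLAB code used in the experiments. It assumes that $g(x_k + d)$ denotes the gradient of the quadratic model at $d$, that is $g_k + B_k d$; the names `steihaug_toint` and `_step_to_boundary` and the iteration cap `max_iter` are hypothetical.

```python
import numpy as np


def _step_to_boundary(d, p, delta):
    """Return tau >= 0 with ||d + tau * p|| = delta (requires ||d|| <= delta)."""
    a = p @ p
    b = 2.0 * (d @ p)
    c = d @ d - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_toint(g, B, delta, max_iter=None):
    """Approximately minimize g^T d + 0.5 d^T B d subject to ||d|| <= delta."""
    g = np.asarray(g, dtype=float)
    g_norm = np.linalg.norm(g)
    # Tolerance quoted in the text: min{1/10, ||g_k||^(1/2)} * ||g_k||.
    tol = min(0.1, np.sqrt(g_norm)) * g_norm
    d = np.zeros_like(g)
    r = g.copy()  # model gradient g + B d, evaluated at d = 0 (assumed meaning)
    p = -r        # first search direction: steepest descent
    max_iter = 2 * g.size if max_iter is None else max_iter  # hypothetical cap
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            return d  # model gradient is small enough
        Bp = B @ p
        curvature = p @ Bp
        if curvature <= 0.0:
            # Nonpositive curvature: follow p to the trust-region boundary.
            return d + _step_to_boundary(d, p, delta) * p
        alpha = (r @ r) / curvature
        d_next = d + alpha * p
        if np.linalg.norm(d_next) >= delta:
            # The CG step leaves the region: stop on the boundary, ||d|| = delta.
            return d + _step_to_boundary(d, p, delta) * p
        r_next = r + alpha * Bp
        beta = (r_next @ r_next) / (r @ r)
        p = -r_next + beta * p
        d, r = d_next, r_next
    return d
```

With a positive definite `B` and a large radius the loop returns an inexact Newton-type step; with a small radius or nonpositive curvature it returns a step of length `delta`, matching the two exits of the test.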
In our experiments the algorithms are stopped whenever the total number of iterations exceeds 10000 or

$$\|g_k\| \leq \epsilon \tag{45}$$

holds with the accuracy parameter $\epsilon = 10^{-5}$.
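The parameter choices of this section can be collected in a few helper functions. The sketch below is an illustration in Python rather than the MATLAB code used for the experiments, and all function names are hypothetical. Two readings are assumptions: the handling of the boundary cases in the radius rule, and the second case of the $\eta_k$ recurrence, taken here as the average of the two previous values. No curvature safeguard is added to the BFGS update because the text does not describe one.

```python
import numpy as np

# Constants quoted in the text.
MU1, MU2, C1, C2 = 0.05, 0.9, 0.25, 2.5
EPS, MAX_ITER = 1e-5, 10000


def update_radius(ratio, delta, step_norm):
    """Radius rule: shrink, keep, or possibly enlarge (boundary cases assumed)."""
    if ratio < MU1:
        return C1 * step_norm
    if ratio < MU2:
        return delta
    return max(delta, C2 * step_norm)


def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B with s = x_new - x, y = g_new - g."""
    Bs = B @ s
    return B + np.outer(y, y) / (s @ y) - np.outer(Bs, Bs) / (s @ Bs)


def eta_sequence(eta0, count):
    """Return [eta_0, eta_1, ...] with eta_1 = eta_0 / 2 and, as assumed here,
    eta_k = (eta_{k-1} + eta_{k-2}) / 2 for k >= 2."""
    etas = [eta0, eta0 / 2.0]
    while len(etas) < count:
        etas.append(0.5 * (etas[-1] + etas[-2]))
    return etas[:count]


def should_stop(g, iteration):
    """Stop when the iteration budget is exceeded or ||g_k|| <= eps."""
    return iteration > MAX_ITER or np.linalg.norm(g) <= EPS
```

A quick sanity check on `bfgs_update` is the secant equation: the updated matrix maps `s` to `y` whenever `s @ y` is nonzero.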

To compare the results appropriately, we use the performance profiles of Dolan & Moré in [IS],
where the measures of performance are the number of iterations ($N_i$), the number of function evaluations
($N_f$), and the number of gradient evaluations ($N_g$). In the algorithms considered, the numbers of iterations
and gradient evaluations are the same, so we only report the profiles for gradient evaluations. Since the
computational cost of a gradient is commonly taken to be about that of three function values,
we further consider the measure $N_f + 3N_g$. The performance of each code is measured by the
ratio of its computational outcome to the best numerical outcome of all codes. This profile offers a
tool for comparing the performance of iterative schemes in a statistical structure. Let $\mathcal{S}$ be the set of all
algorithms and $\mathcal{P}$ be the set of test problems. For each problem $p$ and solver $s$, $t_{p,s}$ is the computational
outcome with respect to the performance index, which is used in defining the performance ratio


$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in \mathcal{S}\}}. \tag{46}$$


If an algorithm $s$ fails to solve a problem $p$, the procedure sets $r_{p,s} = r_{\text{failed}}$, where $r_{\text{failed}}$ should be
strictly larger than any performance ratio (46). For any factor $\tau$, the overall performance of an algorithm
$s$ is given by

$$\rho_s(\tau) = \frac{1}{n_p} \, \text{size}\{p \in \mathcal{P} : r_{p,s} \leq \tau\},$$

where $n_p$ is the number of problems in $\mathcal{P}$. In fact, $\rho_s(\tau)$ is the probability that a performance ratio
$r_{p,s}$ of the algorithm $s \in \mathcal{S}$ is within a factor $\tau \in \mathbb{R}$ of the best possible ratio. The function
$\rho_s(\tau)$ is a distribution function for the performance ratio. In particular, $\rho_s(1)$ gives the probability
that the algorithm $s$ wins over all other considered algorithms, and $\lim_{\tau \to r_{\text{failed}}} \rho_s(\tau)$ gives the
fraction of the considered problems that the algorithm $s$ solves. Hence the performance profile can be
considered as a measure of efficiency for comparing iterative schemes. In Figures 3 and 4 the x-axis shows
the factor $\tau$ while the y-axis exhibits $P(r_{p,s} \leq \tau : 1 \leq s \leq n_s)$.
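The performance profile described above is straightforward to compute from a table of costs. The following Python sketch is an illustration under stated assumptions: the function name `performance_profile` is hypothetical, a failed run is encoded as `np.inf` (so its ratio exceeds every finite factor), and every problem is assumed to be solved by at least one solver.

```python
import numpy as np


def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: array of shape (n_problems, n_solvers); np.inf marks a failure.
    taus:  iterable of factors at which the profile is evaluated.
    Returns an array of shape (len(taus), n_solvers) with entries rho_s(tau).
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)  # best outcome for each problem
    ratios = costs / best                    # performance ratios r_{p,s}
    # rho_s(tau) is the fraction of problems whose ratio is at most tau.
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])


# Usage with made-up numbers (not results from the paper): three problems, two solvers.
# The cost could be, for example, the mixed measure N_f + 3 * N_g.
toy_costs = np.array([[10.0, 20.0],
                      [30.0, 15.0],
                      [5.0, np.inf]])
toy_profile = performance_profile(toy_costs, taus=[1.0, 2.0])
```

The first row of `toy_profile` gives the share of wins of each solver, and the value for a large factor gives the share of problems each solver handles at all.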


4.1 Experiments with highly nonlinear problems 

In this subsection we give some numerical results regarding the implementation of NMTR-1 and NMTR-2
compared with TTR on some two-dimensional highly nonlinear problems involving a curved narrow valley.
More precisely, we consider the Nesterov-Chebyshev-Rosenbrock, Maratos, and NONDIA functions, see,
for example, [5]. In Example 1 the Nesterov-Chebyshev-Rosenbrock function is given, and the Maratos
and NONDIA functions are given by

$$f(x_1, x_2) = x_1 + \theta_1 (x_1^2 + x_2^2 - 1)^2 \quad \text{(Maratos function)}$$

and 

$$f(x_1, x_2) = (1 - x_2)^2 + \theta_2 (x_1 - x_2^2)^2 \quad \text{(NONDIA function)},$$

respectively, where we consider $\theta_1 = 10$ and $\theta_2 = 100$.
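For readers who want to reproduce such a test, the Maratos function is easy to code together with its gradient. The Python sketch below is illustrative only: it assumes the standard form $f(x_1, x_2) = x_1 + \theta_1 (x_1^2 + x_2^2 - 1)^2$ with $\theta_1 = 10$, and the helper names, including the finite-difference check `central_difference`, are hypothetical.

```python
import numpy as np

THETA_1 = 10.0  # value of theta_1 quoted in the text


def maratos(x, theta=THETA_1):
    """Maratos function (assumed form): x1 + theta * (x1^2 + x2^2 - 1)^2."""
    x1, x2 = x
    return x1 + theta * (x1 ** 2 + x2 ** 2 - 1.0) ** 2


def maratos_grad(x, theta=THETA_1):
    """Analytic gradient of the Maratos function."""
    x1, x2 = x
    c = x1 ** 2 + x2 ** 2 - 1.0
    return np.array([1.0 + 4.0 * theta * c * x1, 4.0 * theta * c * x2])


def central_difference(f, x, h=1e-6):
    """Central finite-difference gradient, used only to check maratos_grad."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad
```

Comparing `maratos_grad` with `central_difference(maratos, x)` at a starting point such as (1, 0.95) is a cheap way to catch sign or scaling errors before running a trust-region solver.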

We solve the problem (1) for these three functions using TTR, NMTR-1, and NMTR-2, and the
results regarding the number of iterations and function evaluations are summarized in Table 1. To give a
clear view of the behaviour of TTR, NMTR-1, and NMTR-2, we depict the contour plots of the considered
functions and the iterates obtained by the algorithms in Figure 2(a), (c), and (e). In all three cases, one can
see that NMTR-1 and NMTR-2 need fewer iterations and function evaluations than TTR to solve the
problem. Moreover, TTR behaves monotonically and follows the bottom of the associated valley, while
NMTR-1 and NMTR-2 fluctuate in the valley.
[Figure 2 panels: (a) Nes-Cheb-Rosen contour plot & iterations; (b) function values versus iterations;
(c) Maratos contour plot & iterations; (d) function values versus iterations;
(e) NONDIA contour plot & iterations; (f) function values versus iterations]

Fig. 2: A comparison among NMTR-1, NMTR-2, and TTR: Subfigures (a), (c), and (e) respectively
illustrate the contour plots of the two-dimensional Nesterov-Chebyshev-Rosenbrock, Maratos, and NONDIA
functions and iterations of NMTR-1, NMTR-2, and TTR; Subfigures (b), (d), and (f) show the diagram
of function values versus iterations.
Table 1. Numerical results for highly nonlinear problems

Problem name     Dim   Initial point    TTR          NMTR-1       NMTR-2
                                        N_g   N_f    N_g   N_f    N_g   N_f
Nes-Cheb-Rosen   2     (-1, 1.5)        32    41     27    34     22    29
Maratos          2     (1, 0.95)        31    40     24    29     22    29
NONDIA           2     (-0.9, 1.17)     24    34     27    34     11    17


4.2 Experiments with CUTEst test problems 

In this subsection we give numerical results regarding experiments with NMTR-1 and NMTR-2 on the 
CUTEst test problems compared with NMTR-G, NMTR-H, NMTR-N, and NMTR-M. 

To get a better performance from NMTR-1 and NMTR-2, we tune the parameter $\eta_0$ by testing several
fixed values of $\eta_0$ for both algorithms, where we use $\eta_0 = 0.15, 0.25, 0.35, 0.45$. The corresponding versions
of the algorithms NMTR-1 and NMTR-2 are denoted by NMTR-1-0.15, NMTR-1-0.25, NMTR-1-0.35,
NMTR-1-0.45, NMTR-2-0.15, NMTR-2-0.25, NMTR-2-0.35, and NMTR-2-0.45, respectively. The results
are summarized in Figure 3 for three measures: the number of function evaluations; the number of gradient
evaluations; the mixed measure $N_f + 3N_g$. In Figure 3, subfigures (a), (c), and (e) illustrate the
results of NMTR-1, which produces the best results with $\eta_0 = 0.25$. From subfigures (b), (d), and (f)
of Figure 3 it can be seen that the best results for NMTR-2 are produced by $\eta_0 = 0.45$. Hence for NMTR-1
we use $\eta_0 = 0.25$ and for NMTR-2 we use $\eta_0 = 0.45$ in the remainder of our experiments.

We now test NMTR-G, NMTR-H, NMTR-N, NMTR-M, NMTR-1, and NMTR-2 for solving the
unconstrained problem (1) and compare the produced results. The results of our implementations are
summarized in Table 2, where $N_g$ and $N_f$ are reported. The results of Table 2 show that NMTR-1 has a
competitive performance compared with NMTR-G, NMTR-H, NMTR-N, and NMTR-M; however, NMTR-2
produces the best results. To have a better comparison among these algorithms, we illustrate the results
in Figure 4 by performance profiles for the measures $N_g$, $N_f$, and $N_f + 3N_g$.

In Figure 4, Subfigure (a) displays the profile for the number of gradient evaluations, where the best results
are attained by NMTR-2 and then by NMTR-N with about 63% and 52% of the most wins, respectively.
NMTR-1 is comparable with NMTR-G, NMTR-H, and NMTR-N, but its profile grows faster than the
others, which means its performance is close to that of the best method NMTR-2. Subfigure
(b) shows the profile for the number of function evaluations and has a similar interpretation to Subfigure (a);
however, NMTR-2 attains about 60% of the most wins. In Figure 4, Subfigures (c) and (d) display the profile
for the mixed measure $N_f + 3N_g$ with $\tau = 1.5$ and $\tau = 5.5$, respectively. In this case NMTR-2 outperforms
the others by attaining about 58% of the most wins, and the others have comparable results; however, the
profiles of NMTR-1 and NMTR-M grow faster than the others, implying that they perform close to the best
algorithm NMTR-2.


5 Concluding remarks 

In this paper we give some motivation for employing nonmonotone strategies in trust-region frameworks.
Then we introduce two new nonmonotone terms and combine them into the traditional trust-region
framework. It is shown that the proposed methods are globally convergent to first- and second-order
stationary points. Moreover, local superlinear and quadratic convergence rates are established. Applying these
methods to some highly nonlinear test problems involving a curved narrow valley shows that they have
a promising behaviour compared with the monotone trust-region method. Numerical experiments on a
set of test problems from the CUTEst test collection show the efficiency of the proposed nonmonotone
methods.

Further research can be done in several aspects. For example, by combining the proposed nonmonotone 
trust-region methods with various adaptive radius, more efficient trust-region schemes can be derived, 
see, for example, M- The combination of the proposed nonmonotone terms with several inexact line 
searches such as Armijo, Wolfe, and Goldstein is also interesting, see 0. The extension of the proposed 
method to constrained nonlinear optimization could be interesting, especially for nonnegativity
constraints and box constraints. It could also be interesting to employ nonmonotone schemes for solving
nonlinear least squares problems and systems of nonlinear equations, see [5] and references therein.
Moreover, investigating new adaptive formulas for the parameter $\eta_k$ can be valuable for
improving the computational efficiency.

Fig. 3: Performance profiles of NMTR-1 and NMTR-2 with the performance measures $N_g$, $N_f$, and
$N_f + 3N_g$: Subfigures (a) and (b) display the number of iterations ($N_i$) or gradient evaluations ($N_g$);
Subfigures (c) and (d) show the number of function evaluations ($N_f$); Subfigures (e) and (f) display the
hybrid measure $N_f + 3N_g$. Subfigures (a), (c), and (e) correspond to NMTR-1, and Subfigures (b), (d),
and (f) to NMTR-2.

Fig. 4: A comparison among NMTR-G, NMTR-H, NMTR-N, NMTR-M, NMTR-1, and NMTR-2 by
performance profiles using the measures $N_g$, $N_f$, and $N_f + 3N_g$: Subfigure (a) displays the number of
iterations ($N_i$) or gradient evaluations ($N_g$); Subfigure (b) shows the number of function evaluations
($N_f$); Subfigures (c) and (d) display the hybrid measure $N_f + 3N_g$ with $\tau = 1.5$ and $\tau = 5.5$, respectively.


Appendix. Table 2

Table 2. Numerical results for nonmonotone trust-region methods